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Abstract. A well-known problem in computing some matrix functions iteratively is the lack of 
a clear, commonly accepted residual notion. An important matrix function for which this is the case 
is the matrix exponential. Suppose the matrix exponential of a given matrix times a given vector 
has to be computed. We develop the approach of Druskin, Greenbaum and Knizhnerman (1998) 
and interpret the sought-after vector as the value of a vector function satisfying the linear system 
of ordinary differential equations (ODE) whose coefficients form the given matrix. The residual is 
then defined with respect to the initial-value problem for this ODE system. The residual introduced 
in this way can be seen as a backward error. We show how the residual can be computed efficiently 
within several iterative methods for the matrix exponential. This completely resolves the question 
of reliable stopping criteria for these methods. Further, we show that the residual concept can be 
used to construct new residual-based iterative methods. In particular, a variant of the Richardson 
method for the new residual appears to provide an efficient way to restart Krylov subspace methods 
for evaluating the matrix exponential. 
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1. Introduction. Matrix functions, and particularly the matrix exponential, 
have been an important tool in scientific computations for decades (see e.g. \12\ \13\ 
[T51 [TUl [H]). The lack of a clear notion for a residual for many matrix functions 
has been a known problem in iterative computation of matrix functions [H llOi 137] . 
Although it is possible to define a residual for some matrix functions such as the 
inverse or the square root, for many important matrix functions including the matrix 
exponential, sine and cosine, no natural notion for residuals seems to exist. 

Assume for given A G R"^", such that A + A* is positive semidefinite, and v £ M" 
the vector 

y ~ cxp{—A)v (1-1) 

has to be computed. The question is how to evaluate the quality of an approximate 
solution 

yfc«exp(-A)v, (1.2) 

where k refers to the number of steps (iterations) needed to construct yk ■ We interpret 
the vector y as the value of a vector function y{t) at t = 1 such that 

y'{t)^-Ayit), y{0)^v. (1.3) 

The exact solution of this initial- value problem (IVP) is given by 

y{t) — exp{—tA)v. 
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Table 1.1 

The linear system and matrix exponential residuals. In both cases the sought-after vector is 
f{A)v, with either f{x) = 1/x or fix) = exp(— a;). The error is defined as the exact solution minus 
the approximate solution. 



f{x) l/x exp(-x) 



define = exp(-tA)w, 

exact solution y y = A v 

set y := y{l) 
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Assuming now that there is a vector function ykit) such that ?/fe(l) — yk, we define 
the residual for yk{t) ~ y{t) as 

rk{t) = -Aykit) -y'kit). (1.4) 

The key point in this residual concept is that y — expi~A)v is seen not as a problem 
on its own but rather as the exact solution formula for the problem (II. 3p . The 
latter provides the equation where the approximate solution is substituted to yield 
the residual. We illustrate this in Table [TTTl where the introduced matrix exponential 
residual is compared against the conventional residual for a linear system Ay = v. 
As can be seen in the Table, the approximate solution satisfies a perturbed IVP, 
where the perturbation is the residual. Thus, the introduced residual can be seen as a 
backward error (see Section |4] for residual-based error estimates) . If one is interested 
in computing the matrix exponential exp(— yl) itself, then the residual can be defined 
with respect to the matrix IVP 

X'it) = -AXit), XiO) = /, 

with the exact solution Xit) = expi—tA). Checking the norm of rkit) in (|1.4p is 
proposed as a possible stopping criterion of Krylov subspace iterations first in [4l and 
more recently for a similar matrix function in |22| . 

The contribution of this paper is twofold. First, it turns out that the residual (jl.4p 
can be efficiently computed within several iterative methods for matrix exponential 
evaluation. We show how this can be done in several popular Krylov subspace and 
Chebyshev polynomial methods for computing expi—A)v. Second, we show how the 
residual notion leads to new algorithms to compute the matrix exponential. Two basic 
Richardson-like iterations are proposed and discussed. When combined with Krylov 
subspace methods, one of them can be seen as an efficient way to restart the Krylov 
subspace methods. Furthermore, this approach for the matrix exponential residual 
can be readily extended to the sine and cosine matrix functions (see the conclusion 
section). 

The equivalence between problems (jl.2p and (jl.3p has been widely used in nu- 
merical literature and computations. In addition to already cited work [4j [22] see 
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e.g. the very first formula in ^TT} or [131 Section 10.1]. Moreover, methods for solv- 
ing (|1.2p are applied to (II. 3p (for instance, exponential time integrators [TS1[T1]) and 
vice versa [371 Section 4]. In [37], van den Eshof and Hochbruck represent the error 
ek{t) = y{t) - yk{t) as the solution of the IVP e^(t) = -Aekit) + rkit), £^(0) = and 
obtain an explicit, non-computable expression for ek(t). This allows them to justify a 
stopping criterion for their shift-and-invert Lanczos algorithm, based on stagnation of 
the approximations. Although being used, especially in the field of numerical ODEs 
(see e.g. 9, i331 US the exponential residual (|1.4p does seem to have a potential 
which has not been fully exploited yet, in particular in matrix computations. Our 
paper aims at filling this gap. 

The paper is organized as follows. Section [5] is devoted to the matrix exponential 
residual within Krylov subspace methods. In Section |3] we show how the Chebyshev 
iterations can be modified to adopt the residual control. Section |4] presents some 
simple residual-based error estimates. Richardson iteration for the matrix exponential 
is the topic of Section [5l Numerical experiments are discussed in Section [6l and 
conclusions are drawn in the last section. 

Throughout the paper, unless noted otherwise, || • || denotes the Euclidean vector 
2-norm or the corresponding induced matrix norm. 

2. Matrix exponential residual in Krylov subspace methods. Krylov sub- 
space methods have become an important tool for computing matrix functions (see 
e.g. [38ll5l[23llIIl[3ll|6l[ni[71 [18]). For A e M"^" and v € R" given, the Arnoldi 
process yields, after k steps, vectors wi, . . . , Vk+i G M" that are orthonormal in exact 
arithmetic and span the Krylov subspace ICk{v, Av, . . . , A''~^v) (see e.g. [13l [32l l39]). 
li A = A* , the Lanczos process is usually used instead of Arnoldi. Together with the 
basis vectors Vj , the Arnoldi or Lanczos processes deliver an upper-Hessenberg matrix 
Hk G ]R('=+i)x'=, such that the following relation holds [13l|32l[39]: 

AVk = Vk+iKki or equivalently, 
AVk = VkHk + hk+iMVk+iel, 

where Vk £ M" ^ has columns vi, . . . , Vk, Hk G K'^ ^ is the matrix H_k without the 
last row (0, ■ • • ,0, hk+i,k), and Cfc = (0, • • • ,0, 1)^ G M'"'. The first basis vector vi is 
the normalized vector v: vi — v/\\v\\. 

2.1. Ritz-Galerkin approximation. An approximation yk to the matrix ex- 
ponential y = cxp{—A)v is usually computed as yk{^), with 

yfc(t) = T4exp(-iiJfe)(/3ei), (2.2) 

where (3 — \\v\\ and ei — (1,0, .. . ,0)"^ G K'^. An important property of the Krylov 
subspace is its scaling invariance: application of the Arnoldi process to tA, t G M, 
results in the upper-Hessenberg matrix tHf. . and the basis vectors vi, . . . , Vk+i are 
independent of t. It is convenient for us to write 

Vkit) = VkUkit), Uk{t) = exp(-ti/fe)(/3ei), (E2I) 

with Uk{t) : M M'' being the solution of the projected IVP 



u'k{t) ^ -HkUk{t), Ufc(0)=/3ei. 
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(2.3) 



The following simple Lemma (cf. formula (29)]) provides an explicit expression for 
the residual. 

Lemma 2.1. Letykit) w y{t) = exp(— tj4)w be the Krylov subspace approximation 
given by 112. 2\) . Then for any t ^ the residual rk{t) for yk(t) « y{t) is 

Tk{t) = -/3hk+i,kel exp{~tHk)eiVk+i, 
lkfc(OII = IPhk+i.kel exp{-tHk)ei\ = \hk+i,k[uk{t)]k\, 

where [uk(t)]k is the last entry of the vector function Uk{t) defined in \2.2\ ). 

Proof It follows from that y^(t) = -VfciJfe exp(-ti/fe)(;3ei). From the 

Arnoldi relation (|2.ip we have 

Aykit) = AVkexp{~tHk){l3ei) = {VkHk + hk+i.kVk+iel) exp{-tHk){l3ei), 

which yields the result: 

rk{t) = -Ayk{t) - y'k{t) = -ft.fc+i,feWfe+ief exp(-ii7fe)(/3ei). 

□ 

Note that the Krylov subspace approximation (|2.2p satisfies the initial condition 
yfc(O) = w by construction: 

yk{0) = Vk{Pei)= f3vi=v. 

Thus, there is no danger that the residual rk{t) = —Ayk{t) — y'k{t) is small in norm 
for some yk{t) approaching a solution of the ODE system y' = Ay with other initial 
data. 

The residual notion (|1.4p allows us to see (|2.2p as the Ritz-Galerkin approxi- 
mation: the residual vector rk{t) is orthogonal, for any i ^ 0, to the search space 
span(ui, ...,Vk): 

Vk*rk{t) = VC{-Ayk{t)-y'k{t)) = VC{~AVkUk{t) ~Vku',{t)) = -HkUk{t) ~ u',{t) - 0. 

(2.4) 

Here we used the relation V^AVk = Hk, which follows from (|2.1[) if Vk is orthonormal 
(this may not always be the case in floating point arithmetic). 

The residual rk{t) turns out to be closely related to the so-called generalized 
residual pk{t) [18^. Following [18] (see also [31]), we can write 

yk{t) = PVk exp(-tiJfc)ei ^^^f^^^^ + ^^fc)"VeidA, 

y{t) = exp(-tA)w = -J-^ (f e^(A/ + tA)-\dX, 

where F is a closed contour in C encircling the spectrum of A. Thus, yk{t) is an 
approximation to y{t) where the resolvent inverse {\I -\-tA)^^v is approximated by k 
steps of the fully orthogonal method (FOM): 

tk = y{t) - yfc(i) - ^ e^error^O^dA. 

Since the FOM error is unknown, the authors of [TS] replace it by the known FOM 
residual, which is (3{—thk+i^k)vk+i&k{XI + tHk)~^ei. This leads to the generalized 
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residual 

Pk{t) = i e^f3{-thk+i,k)vk+iel{\I + tHk)-^eiA\ ^ ^ 

27ri Jy (2.5) 

= -I3thk+i^kel exp{-tHk)eiVk+i, 

which coincides, up to a factor t, with our matrix exponential residual rk{t). For the 
generalized residual, this provides a justification which is otherwise lacking: strictly 
speaking, there is no reason why the error in the integral expression above can be 
replaced by the residual. In Section 16.41 a numerical test is presented to compare 
stopping criteria based on rk{t) and Pk{t)- 

2.2. Shift-and-invert Arnoldi/Lanczos approximations. In the shift-and- 
invert (Sal) Arnoldi/Lanczos approximations [28t i37) the Krylov subspace is built up 
with respect to the matrix (/ + 7^!)"^, with 7 > being a parameter, so that the 
Krylov basis matrix Vfe+i G M^^C^+i) and an upper- Hessenberg matrix G mC^+i)^*^ 
are built such that (cf. p.ip ) 

(/ + 7^)" Vfe = Vk+iKk-, or, equivalently, 
(/ + -iA)-^Vk = VkHk + hk+i,kVk+iel, 

where Hk G M'^^'^ is the first k rows oi H_f.. The approximation yk{t) ~ exp{—tA)v is 
then computed as given by (|2.2p . with Hk defined as j37] 

Hk = -iH^'-I). (2.7) 

7 

Relation (|2.6p can be rewritten as (cf. formula (4.1) in [3 7) ) 

AVk = VkHk - + ^A)vk+ielH^\ (2.8) 

7 

which leads to the following lemma. 

Lemma 2.2. Let yk{t) k, y{t) ~ cicp{—tA)v be the Sal Krylov subspace approx- 
imation h2.2fl . with Hk defined in {2.1^ . Then for any t ^ the residual rk{t) for 

yk{t) ~ yit) is 

Tk (t) = /? ^^^^ elH-^ eM-tHk)ei (/ + "fA)vk+i , 
7 



iTkim^p 



k+l.k 
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\elH-'eM~tHk)e,\il+j\\A\ 



Proof. The proof is very similar to that of Lemma 12. II Instead of the conventional 
Arnoldi relation (12. ip . relation (12. St should be used. □ 



2.3. Error estimation in Krylov subspace methods. If yk{t) is a Krylov 
subspace approximation to y(t) = cxp{—tA)v, the error function efe(t) = y{t) ~ yk{t) 
satisfies the IVP 

e'kit) = ~Aek{t) + rfc(t), 6^(0) - 0. (2.9) 

To estimate the error, this equation can be solved approximately by any suitable time 
integration scheme; for example, by Krylov exponential schemes as discussed e.g. in 
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[TTl Section 4] or [IB]. The time integration process for solving (|2.9|) can further 
be optimized to take into account that the residual function rk{t) depends on time 
as rk{t) — 'ipk{t)vk+i with Vk+i — const and ipk{t) being a scalar function of t (see 
Lemma 12. 1[) : 

Van den Eshof and Hochbruck 37 propose to get an error estimate by replacing in 
Skit) = y{t) ~ yk{t) the exact solution y{t) with the same continued Krylov process 
approximation yk+mit): 

ek{t) ~ yk+mit) - 2/fe(t) = Vk+mUk+mit) - VkUkit) = Vk+mekit), 

\\ekm^rekm^hk+ut)-ukit)i ^ ■ ' 

where 

VkUkit) ^ Vk+mUkit), Ukit) = [iukit))^,0,...,Of 



and Ukit) and Uk+mit) are the solutions of the projected IVP (|2.3I) obtained with 
respectively k and k + m Krylov steps. It is not difficult to see that in this case 
ikit) = Uk+mit) — Ukit) is the Galerkin solution of ()2.9|) with respect to the subspace 
colspanVfe+m- Indeed, we have 

y'k+m ^ -~M)k+rn - rk+mit), yk+mit) = Vk+mUk+mit), 

y'k ^ -Ayk - rkit), ykit) ^ Vk+mUkit). 

Subtracting y'f, from and multiplying the result from the left by Vj*^^ (in as- 

sumption the orthonormality of Vk+m is not spoiled in floating point arithmetic) we 
obtain 

iuk+mit)-Ukit))' = -Hk+miuk+mit)-Ukit)) + Vk+„Jkit), Vk+rn^kit) = 1pkit)ek+l, 

and we arrive at the projected IVP 

i'kit) ^ ~Hk+mhit) + Mt)ek+i. (2.11) 

where ek+i is the (fc + l)th basis vector in This shows that error estimation 

by the same continued Krylov process is a better option than solving the correction 
equation (|2.9p by a new Krylov process: the latter would mean that we neglect the 
built up subspace. In fact, solving IVP (|2.9p by another process and then correcting 
the approximate solution ykit) can be seen as a restarting of the Krylov subspace. 
We explore this approach further in Section [5j 

3. Matrix exponential residual for Chebyshev approximations. A well- 
known method to compute ymit) ~ exp(— i^)u is based on the Chebyshev polynomial 
expansion (see for instance [351130] ): 



ymit) = Pmi-tA)v 



Y^CkTki~tA) + ^I 

.fe=l 



(3.1) 



Here we assume that the matrix tA can be transformed to have its eigenvalues within 
the interval [—1,1] C M (for example, A can be a Hermitian or a skew-Hermitian 
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matrix). Here, are the Chebyshev polynomials of the first kind, whose actions on 
the given vector v can be computed by the Chebyshev recursion 



To{x) = f, Ti{x) = X, Tk+iix) = 2xTk{x) - Tk^i{x), fc = 1,2, , 
and the coefficients Ck can be computed, for a large M, as 



Ck 



^ mYI exp(cos(6'j)) cos(fc6ij), fc = 0, 1, 



,m. 



M 



(3.2) 



(3.3) 



which means interpolating exp(x) at the Chebyshev polynomial roots (see e.g. |30i 
Section 3.2.3]). This Chebyshev polynomial approximation is used for evaluating 
different matrix functions in [2]. 

The recursive algorithm p.l|) - p.3|) can be modified to provide, along with ym{t), 
vectors y!^{t) and Aymit), so that the exponential residual rm{t) = —Aymit) — y'mit) 
can be controlled in the course of the iterations. To do this, we use the well-known 
relations 



T!,{x) = kUk-i{x), 
1 



xTk {x) 
xUk{x) 
Tk{x) 



{Tk+i{x) +Tk-i{x)), 
{Uk+i(x) + Uk^i{x)), 
iUkix)-Uk^2ix)), 



where fc 1, 2, . . . and Uk are the Chebyshev polynomials of the second kind: 
Uoix) = l, Uiix)=2x, Uk+i{x)=2xUk{x)-Uk-i{x), fc = l,2,.... 



(3.4) 
(3.5) 

(3.6) 

(3.7) 

(3.8) 



For dSUl) to hold for fc = 1 we denote C/_i(x) = 0. From (|3ll]) , dSH) and (jSH) it follows 
that 



.k=l 



Y^2t^{Uki~tA) + Uk-2i-tA)) 



.fc=i 



2t 



(3.9) 



m = 1,2, . 



Similarly, from p.ip . p. 51) and p. 71) . we obtain 



-Aym{t) 



^ |(T,+i(-tA) + Tk.,{-tA)) '-^A 



Co 



J2 f^{Uk+i{-tA) - Uk-si-tA)) ^A 



.k=l 



(3.10) 



V, m 



1,2, 



where we define U-2{x) = — 1. 

The obtained recursions can be used to formulate an algorithm for computing 
ym.{t) ~ exp{—tA)v that controls the residual rm{t) = —Aym{t)—y',j^{t), see Figure IXTl 
Just like the original Chebyshev recursion algorithm for the matrix exponential, it 
requires one action of the matrix A per iteration. To be able to control the residual, 
more vectors have to be stored than in the conventional algorithm: 8 instead of 4. 
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u_2 := —V, U-i :— 0, Uq :— v, Ui :— (—2 *t) * (A* v) 
compute Co 

y := (0.5 * Co) * Mo, y' := 0, minusAy (co/t/4) * ui 

for fc = 1, . . . ,iVmax 

U2 :~ (—2 *t) * {A* ui) — uq 
compute Ck 

y :=y + (cfc/2) * [ui - ii_i) 

y' := y' + (c^ * k/2/t) * {m + u_i) 

minusAy := minusAy + (cfc/4/i) * (u2 — u_2) 

u_2 := 

ui Mo 

uo Ml 

Ml := M2 

resnorm := ||minusAy — y'\\ 
if resnorm < toler 

break 
end 

end 

Fig. 3.1. Chebyshev expansion algorithm to compute the vector yN^^^^it) ~ exp{—tA)v. The 
input parameters are A g R"^", v S K", t > and toler > 0. It is assumed that the eigenvalues A 
oftA satisfy -1 A s£ 1. 

4. Residual-based error estimates. By definition of the residual (|1.4p . the 
approximate solution yk{t) ~ exp{—tA)v is the exact solution of the problem 

y',{t) = ~Aykit)^rk{t), y{0) ^ v, (4.1) 

which is a perturbation of the original problem (|1.3p . Therefore the residual (t) can 
be seen as the backward error for ytit). From (j4.ip and (|1.3p it is easy to see that 
the error ek{t) satisfies the initial- value problem 

e'kit) = -Ackit) + Tkit), efc(O) = 0, (4.2) 

with the exact solution 

efe(t)= / exp{{s-t)A)rk{s)ds. (4.3) 
Jo 

This formula can be used to obtain error bounds in terms of the norms of the matrix 
exponential and the residual [40] . 

Lemma 4.1. Let \A\ denote a matrix whose entries are absolute values of the 
entries of A. Let rk{t) : M — ?> M" be continuous for t ^ and fk{t) be a vector- 
function with entries 

[rk{t)]i = max |[rfc(s)]i|, i^l,...,n. 

s£[0,t] 
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It holds for any t ^ that 

\\ekit)\U ^ \\\t^i-tA)\fkm* ^ \\\M-tA)\\\4fkit)\U 



(4.4) 



with II • II* being any consistent matrix (vector) norm and ip{x) — (exp(a;) — 
Note that, for any B G M"^", |||v3(i3)|||* = ||cp(S)l|* for the 1- and oo- norms. 

Proof. For simplicity of notation, throughout the proof we omit the subindex 
and write ek{t) — t{t) and rk{t) — r{t). Denote, for a fixed i, the entry (?,j) of the 
matrix exp((s — t)A) by eij[s). Entry i of t{t) can be bounded as 



\wu = 



t n 

^ey(s)[r(s)]jds 
t 

e,j{s)[r{s)]jds 



V / e^j{s)[r{s)]jAs 



(4.5) 



For any fixed t 0, e.y (s) is an infinitely differentiable function in s, represented by 
the uniformly convergent power series 



I +{s-t)A + --- + 



kl 



A" 



Therefore, on the interval [0,t] the function ey (s) changes its sign a finite number of 
timefl Denote the points where ey (s) changes its sign in [0,t] by ti, t2, . . . , tm-i, 
with m depending on the indices i and j of eij{s). Setting to = and tm — t, we 
obtain 



eij(s)[r(s)]jds = ^ / eij(s)[r(s)]jds, 



1=1 -"^'-1 



where eij (s) is either nonnegative or nonpositive on each of the subintervals [ti-i,ti\. 
Since the function [^(s)]^ is continuous, a version of the mean-value theorem for the 
integral [42l Theorem 5, Section 6.2.3] can be applied to each of the integrals under 
the summation: 



e»j(s)[r(s)]jds = ^[r(^/)]j / e^j{s)(ls, 



1=1 



ti-i 



with = to Ci =S= ^ ^2 ^ ^2 • ■ • =S= tm-i =S= i,m ^ tm = t- This yields the bounds 



711 „t 



1 = 1 •'^'-^ 
ft 



m „t 



gminjr(s)]j^ /. ey (s)dssC j (s)[r(s)]jds s: maxjr(s)]j ^ j ey (s)ds 







t 

e^J{s)[r{s)]jAs 





1=1 -"^'-1 
ft 



min [r(s)]j / ey(s)ds ^ / e,y(s)[r(s)]jds ^ max [r(s)]j / ey(s)ds. 



^ max 



mill [r(s)]. 



max [r(s)]. 



(s)ds 



max |[r(s)]j| = [r{t)]j 



^ This follows from the fact that a smooth function can have only a finite number of isolated roots 
on a bounded interval. 
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Evaluating 



ds = [tif{-tA)],j 
and substituting 

\[t^{-tA)],, \ [f{t)]j 

into (1431) yields 

n 

□ 

The estimates provided by the last lemma can be specified further if more infor- 
mation on A is available. For symmetric positive definite A in the 2-norm holds 

\\ek{t)\\i^t\\fk{t)\l t^O. 

This estimate appeared in [H formula (32)]. If the eigenvalues of \A\ are known to 
lie in the interval [Amin, Amax], with Amin > 0, then 

\\\M^tA)\\\ \\M-tA)\\ = \M-tx^in)\, 

where A is a diagonal matrix with the eigenvalues of |^| as its entries. 

5. Richardson iteration for the matrix exponential. The notion of the 
residual allows us to introduce a Richardson method for the matrix exponential. 

5.1. Preconditioned Richardson iteration. Consider the preconditioned Ri- 
chardson iterative method 

Xk+i = Xk + M^^Tk (5.1) 

for solving a linear system Ax ~ b, with the preconditioner M « A and residual 
rk = b — Axk- Note that M^^r^ is an approximation to the unknown error A^^rk- 
By analogy with (jS.ip . one can formulate the Richardson method for the matrix 
exponential as 

yk+i{t) = ykit) + hit), (5.2) 

where « ek is the approximate solution of the IVP (12.91) . One option, which we 
follow here, is to choose a suitable M ^ A and let ik be the solution of the IVP 

e'kit) = -Mikit) + rk{t), efc(O) - 0. (5.3) 

Just as for solving linear systems, M has to compromise between the approximation 
quality M sa A and the ease of solving (|5.3p . 

In fact, the exponential Richardson method can be seen as a relative of the wave- 
form relaxation methods for solving ODEs, see e.g. [IHl HO]. The key difference of 
method (|5.2p - (|5.3p from the waveform relaxation methods is that the latter are merely 
time-stepping methods. In particular, in waveform relaxation methods relation (|5.3p 
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,(s)ds 



I + {s-t)A + ^ 



+ 



k\ 



,{s)[r{s)]jds 



is not solved as such but replaced by a discretization, e.g. by a linear multistep inte- 
gration formula. 

The residual rk{t) of the Richardson iteration (|5.2p - (|5.3p can be shown to satisfy 
the following recursion. From (j5.2[) and (j5.3p we have 

-y'k+iit) = -y'kit) + Mh{t) - Tkit). 

Subtracting relation Ayk+i{t) = Ayk{t) + Alk{t) from this equation, we get 

rk+i{t) - -y't,{t) + Mik{t) - Tkit) - Aytit) - Ah{t) = (Af - A)h{t). (5.4) 
Taking into account that 

ik{t)= f exp((s -i)M)rfc(s)ds, 

we obtain (cf. [26]) 

rk+i{t) - {M - A)ik{t) = {M ~A) I exp((s - t)M)rk{s)ds. (5.5) 



The residual (t) ~ 
satisfies for any t ^ 



Using relation (14. 4p . we arrive at the following result. 

Lemma 5.1. Let \A\ and fkit) he as defined in Lemma 
~y'ki^) ~ Mjk{t) in the exponential Richardson method ([5 



||rfe+i(t)|U < \\\t{M - A)ip{~tM)\fk{t)\\, 

^ \\\t{M ^ AM-tM)\\\4fk{t)\U, 
so that max ||rfe+i(s)|L ^ max |||s(M — A)93(— sAf)||L max |kfe(s)|L, 

s£[0,t] s£[0,t] s£[0,t] 

with II • II* being any consistent matrix (vector) norm and (p{x) — (exp(x) — l)/x. 
Proof. The proof follows the lines of the proof of Lemma 14.11 □ 
The estimate provided by the lemma shows that, at least for some matrices A 
and AI and not too large t ^ 0, the exponential Richardson iteration converges faster 
than the Richardson iteration for linear system solution. Indeed, since 

t{M - A)ip{-tM) = (M - A)M-'^{I - exp(-iM)), 

the upper bounds for the residual reduction are 

linear system Richardson: ||rfe_|_i||* < ||(A/ — yl)M^"^||*||rfc||*, 
exponential Richardson: ||rfe+i(i)||* |||(M - A)M^^{I - exp(-tM))|||*||ffe(t)||*, 

with f ^ in the second inequality (both sides of the inequality are zero for t — 0). 
For general matrices A and M it is hard to prove that 

|||(M - A)M-\I - exp(-a/))||| II (Af - A)M-^\\, t ^ 0. 

This inequality holds in the 2-norm, for instance, if A is an Af -matrix and M is 
its diagonal part (in this case the matrices A/ — A, A/~^ and / — exp{—tM) are 
elementwise nonnegative and we can get rid of the absolute value sign). As can be 
seen in Figure 15.11 exponential Richardson can converge reasonably well even when 
||(Af — ^)Af~^|| is hopelessly close to one. 
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Fig. -5.1. Upper bounds for residual reduction in Richardson iteration for the linear S[ 
tern (dashed) and matrix exponential (dash- dotted). A = tridiag(— 1, 3, — 1) (top) and A 
tridiag(-l, 2, -1) (bottom). In both cases A G RioOxlOO^ ^ diag(A). 



An important practical issue hindering the use of the exponential Richardson 
iteration is the necessity to store the vectors rk{t) for different t. To achieve a good 
accuracy, sufficiently many samples of (t) have to be stored. Our limited experience 
indicates that the exponential Richardson iteration can be of interest if the accuracy 
requirements are relatively low, say up to 10~^. In the experiments described in 
Section lOl iust 20 samples were sufficient to get the residual below tolerance 10~^ 
for a matrix of size n = lO**. 

5.2. Krylov restarting via Richardson iteration. In the exponential Ri- 
chardson iteration (|5.2I) the error ek{t) does not have to satisfy (|5.3I) . which is just one 
possible choice for efc(i). Another choice is to take efe(t) to be the Krylov approximate 
solution of the IVP 



12 



(5.6) 



If the approximate solution yk{t) is also obtained by a Krylov process, then the 
Richardson iteration (j5.2p . (|5.6I) can be seen as a restarted Arnoldi/Lanczos method 
for computing exp{—tA)v. Indeed, assume that the IVP (j5.6p is solved approximately 
by m Arnoldi or Lanczos steps, so that the next Richardson approximation is 

yk+rn{t) = Vkit) + hit). (5.7) 

Assume yk{t) is the Krylov or Sal Krylov approximation to exp{—tA)v, given by (|2.2p . 
(|2.ip or by (|2.2I) , (|2.7I) respectively. To derive an expression for rk+m (t) , we first notice 
that 

rk{t)^Mt)wk, Vfc:M^M, «;fc = const G R", (5.8) 
with a scalar function ^pk (t) and a constant vector Wk ■ These are given by 

ipk{t) = -/3hk+i,kel exp{-tHk)ei, Wk = Wfc+i 
for the regular Krylov approximation (see Lemma |2. II) and by 

Mt) = P ^^^^^ elH^\M~tHk)eiiI + -rA)vk+u Wk = iI + -/A)vk+i 
7 

for the shift-and- invert Krylov approximation (see Lemma l2.2p . The error 

f-k{t) ^ y{t) - Vkit) ^ / exp((s - <)yl)rfe(s)ds = / ■(/'fc(s) exp((s - t)A)u;fcds 
Jo Jo 

is approximated by the m-step Krylov solution ek{t) of (|5.6p : 

h{t) ^ / V'fc(s)Fmexp((s - t)^™)||wfe||eids 
Jo 

- /■* - (5 9) 

= Vm exp((s - t)i7,„)'0fc(s)||wfc||eids. 



■"(*) 

where ei = (1, 0, . . . , 0)'^ G R™ and y,„ e R"^™ and e ]Rmxm ^.gg^j^- fj.^^ ^ 
steps of the Arnoldi/Lanczos process for the matrix A and the vector Wk- It is not 
difficult to see that u{t) is the solution of the IVP 

u'it) - -H„,uit) + Mt)\\wk\\ei, uiO) - 0. (5.10) 

From ([5J|) and (|5.10p . we have 

rk+,n{t) = -y'k+m{t) - Ayk+,nit) = -y'k{t) - e'kit) - ^Vkit) - Aik{t) 
= rk{t) - V„,u'(t) - AV,nu{t) = rk(t) - V„,(-H,nu(t) + Mt)\\wk\\ei) - AV„,u{t) 
= rk{t) + V,nH„,u{t) - Mt)\\wk\\V„^ei ~AV„,u{t) = iV„,H„, - AV„,)u{t). 

rkit) 

(5.11) 

If V and H result from the conventional Arnoldi/Lanczos process, then (cf. (|2.ip ) 

VmHm - AVm = -h„i+l,mVm.+ieJn, SO that 

rk+m{t) = ~hm+l,m\u{t)]raVm+l, (5-12) 
13 



with [u{t)]ra being the last component of u{t). If V and H are obtained with the Sal 
Arnoldi/Lanczos process then (cf. (|2.8p ') 

VmHm - AVm = h„i+ljnl^^{I + "1 A)v,n+ie^H ^ , 

with all quantities defined by (|2.6p ~ p.7p (replacing the subindices -k by -m and adding 
the ^ sign) . This yields 

c- o-i 

7 [H,n^{t)\,n{I + -lA)v,n+l- (5.13) 

From (|5.12p and (|5.13p we see that the residual rk+m{t) is, just as in (|5.8p . a scalar 
time-dependent function times a constant vector. This shows that the derivation 
for rk+m{t) remains valid for all Krylov- Richardson iterations (formally, we can set 
yk{t) := yk+m{t) and repeat the iteration (|5.7p ). 

6. Numerical experiments. All our numerical experiments have been carried 
out with Matlab on a Linux and Mac PCs. Unless reported otherwise, in all ex- 
periments the initial vector v is taken to be the normalized vector with equal entries. 
Except Section [Ol in all the experiments the error reported is the relative error norm 
with respect to a reference solution computed by the EXPOKIT method [34]. The 
error reported for EXPOKIT is the error estimate provided by this code. 

6.1. Residual in Chebyshev iteration. The following tests are carried out for 
the Chebyshev iterative method with incorporated residual control (see Figure [5TT|) . 
We compute exp(— yl)w, where v € R" is a random vector with mean zero and standard 
deviation one. In the first test, the matrix A € M"'*" is diagonal with diagonal entries 
evenly distributed between —1 and 1. In the second test, we fill the first superdiagonal 
of A with ones, so that A becomes ill-conditioned. The plots of the error and residual 
norms are presented in Figures 16.11 and 16.21 

As can be expected, for nonnormal A the error accumulates during the iteration, 
so it is important to know when to stop the iteration. Too many iterations may yield 
a completely wrong answer. The residual sharply reflects the error behavior, thus 
providing a reliable error estimate. 

6.2. A convection-diffusion problem. In the next several numerical experi- 
ments the matrix A is taken to be the standard five-point central difference discretiza- 
tion of the following convection-diffusion operator acting on functions defined in the 
domain {x,y) G [0, 1]^: 

L[U] = -{DlU^)^ - {D2Uy)y + Pe{VlU^ + V2Uy), 

n/ ^ JlO' (x,2/) e [0.25,0.75]2, 1 
Di{x,y) = < . D2{x,y) = -Di{x,y), 

I 1 otherwise, 2 

vi{x,y)=x + y, V2{x,y)=x-y. 

To guarantee that the convection terms yield an exactly skew-symmetric matrix, be- 
fore discretizing we rewrite the convection terms in the form |24) 

ViU^ + V2Uy = ^{VlU^ + V2Uy) + ^((Wiu)^, -|- {v2u)y). 

This is possible because the velocity field (vi,V2) is divergence free. The operator L 
is set to satisfy the homogeneous Dirichlet boundary conditions. The discretization 
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Fig. 6.1. The residual and true error norms in the Chebyshev algorithm to compute ym ~ 
exp(— against iteration number m. Normal matrix A G R"^". Top: n = 10*. Bottom: 
n = IQS. 



is done on a 102 x 102 or 402 x 402 uniform mesh, producing an n x n matrix A of 



size n = 10orn = 16xl0, respectively. The Peclet number varies from Pe ■ 



convection, A = A 
A^h «8 X 10-4. 



) to Pe = 10 , which on the finer mesh means ||A — A \\i 



(no 

'\\A + 



6.3. Exponential Richardson iteration. In this section we apply the expo- 
nential Richardson iteration (15.21) . (|5.3p to compute the vector exp(— for the 
convection-diffusion matrices A described in Section 16.21 The mesh is taken to be 
102 X 102. As discussed above, to be able to update the residual and solve the 
IVP (|5.3p , we need to store the values of {t) for different t spanning the time inter- 
val of interest. Too few samples may result in an accuracy loss in the interpolation 
stage. On the other hand, it can be prohibitively expensive to store many samples. 
Therefore, in its current form, the method does not seem practical if a high accuracy 
is needed. On the other hand, it turns out that with relatively few samples (w 20) a 
moderate accuracy up to 10~^ can be reached. 
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Fig. 6.2. The residual and true error norms in the Chebyshev algorithm to compute ym ~ 
exp(— j4)i) against iteration number m. Nonnormal matrix A S R"^". Top: n = 10*. Bottom: 
n = IQS. 



We organize the computations in the method as follows. The residual vector 
function rfe(t) is stored as 20 samples. At each iteration, the IVP (|5.3p is solved 
by the Matlab odel5s ODE solver, and the values of the right-hand side function 
—Mikit) + fk{t) are interpolated using the stored samples. The odel5s solver is run 
with tolerances determined by the final required accuracy and produces the solution 
efe(i) in the form of its twenty samples. Then, the solution and residual are updated 
according to (|5.2|) and (|5.5I) respectively. 

We have chosen M to be the tridiagonal part tridiag(A) of the matrix A. Table [01 
and Figure 16.31 contains results of the test runs. Except the Richardson method, 
as a reference we use the EXPOKIT code [31] with the maximal Krylov dimension 
100. Note that EXPOKIT provides a much accurate solution than requested by the 
tolerance toler = 10"''. 

It is rather difficult to compare the total computational work of the EXPOKIT 
and Richardson methods exactly. We restrict ourselves to the matrix- vector part of the 
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Table 6.1 

Performance of the exponential Richardson method for the convection- diffusion test problem, 
toler = 10-*, M = tridiag(A). The CPU times are measured on a 3GHz Linux PC. We emphasize 
that the CPU time measurements are made in Matlab and thus are only an approximate indication 
of the actual performance. 
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error 




CPU time, s 


A 1 steps 
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Fig. 6.3. Convergence history of the exponential Richardson iteration 



work. In the Richardson method this work consists of the matrix- vector muhiphcation 
(matvec) with M — A in (15.51) and the work done by the odel5s solver. The matvec 
with bidiagonal M — A costs about 3n flops times 20 samples, in total 60n flop^. The 
linear algebra work in odel5s is essentially tridiagonal matvecs, LU factorizations 
and back/forward substitutions with (possibly shifted and scaled) M. According 
to TJ", Section 4.3.1], tridiagonal LU factorization, back- and forward substitution 
require about 2n flops each. A matvec with tridiagonal M is 5n flops. Thus, in total 
exponential Richardson costs 60n flops times the number of iterations plus 2n flops 
times the number of LU factorizations and back/forward substitutions plus 5n flops 
times the total number of ODE solver steps. The matvec work in EXPOKIT consists 
of matvecs with pentadiagonal A, which is about 9n flops. 

From Table 16.11 we see that exponential Richardson is approximately twice as 



^We use definition of flop from |13l Section 1.2.4]. 
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Fig. 6.4. Convergence of the conventional Arnoldi method with two existing stopping criteria 
and Krylov- Richardson with the residual-based stopping criterion for tolerance toler = 10~^. Left: 
stagnation-based criterion, Arnoldi stops too early (201 matvecs, 2.6 s CPU time, error l.Oe—03). 
Right: generalized residual criterion, Arnoldi stops too late (487 matvecs, 139 s CPU time, error 
4-9e—08). Parameters of the Krylov- Richardson run for both plots: 434 matvecs, 11 s CPU time, 
error 2.2e—06). The CPU measurements (on a 3 GHz Linux PC) are made in Matlab and thus are 
only an indication of the actual performance. 




# matvecs # matvecs 



Fig. 6.5. Convergence plots of the Arnoldi/ Lanczos and the new Krylov- Richardson methods, 
mesh 102 X 102, Pe = 100. Left: restart every 15 steps, right: Sal strategy with CMRES. The peaks 
in the residual plots on the left correspond to the restarts. 



cheap as EXPOKIT. As expected from the convergence estimates, exponential Ri- 
chardson converges much faster than the conventional Richardson iteration for solving 
a linear system Ax = v would do. For these A and w, 8-9 iterations of the conventional 
Richardson would only give a residual reduction by a factor of ~ 0.99. 

6.4. Experiments with Krylov-Richardson iteration. In this section we 
present some numerical experiments with the Krylov-Richardson method presented 
in Section lS^ We now briefly describe the other methods to which Krylov-Richardson 
is compared. 

Together with the classical Arnoldi/Lanczos method [TTJ [311 IHl |T7] , we have tested 
the Sal method of Van den Eshof and Hochbruck [37]. We have implemented the 
method exactly as described in their paper, with a single modification. In particular, 
as advised by the authors, in all the tests the shift parameter 7 is set to O.lfend and the 
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relaxed stopping criterion strategy for the inner iterative Sal solvers is employed. The 
only thing we have changed is the stopping criterion of the outer Krylov process. To 
be able to exactly compare the computational work, we can switch from the stopping 
criterion of Van den Eshof and Hochbruck (based on iteration stagnation) to the 
residual stopping criterion (see Lemma \2.2^ . Note that the relaxed strategy for the 
inner Sal solver is then also based on the residual norm and not on the error estimate. 

Since the Krylov-Richardson method is essentially a restarting technique, it has 
to be compared with other existing restarting techniques. Note that a number of 
restarting strategies have recently been developed [3S1 [U [TH |51 [23] ■ We have used 
the restarting method described in [29 . This choice is motivated by the fact that the 
method from ^29J turns out to be algorithmically very close to our Krylov-Richardson 
method. In fact, the only essential difference is handling of the projected problem. In 
the method [29] the projected matrix Hk built up at every restart is appended to a 
larger matrix H^+k- There, the projected matrices from each restart are accumulated. 
Thus, if 10 restarts of 20 steps are done, we have to compute the matrix exponential 
of a 200 X 200 matrix. In our method, the projected matrices are not accumulated, 
so at every restart we deal with a 20 x 20 matrix. The price to pay, however, is the 
solution of the small IVP ((glB . 

In our implementation, at each Krylov-Richardson iteration the IVP (|5.10p is 
solved by the odel5s ODE solver from Matlab. To save computational work, it is 
essential that the solver be called most of the time with a relaxed tolerance (in our 
code we set the tolerance to 1% of the current residual norm). This is sufficient to 
estimate the residual accurately. Only when the actual solution update takes place 
(see formula (|5.7p ) do we solve the projected IVP to a full accuracy. 

Since the residual time dependence in Krylov-Richardson is given by a scalar func- 
tion, little storage is needed for the look-up table. Based on the required accuracy, the 
ode 15s solver automatically determines how many samples need to be stored (in our 
experiments this usually did not exceed 300). This happens at the end of each restart 
or when the stopping criterion is satisfied. Further savings in computational work 
can be achieved by a polynomial fitting: at each restart the newly computed values of 
the function "0^ (see (|5.8|) ) are approximated by a best- fit polynomial of a moderate 
degree (in all experiments the degree was set to 6). If the fitting error is too large (this 
depends on the required tolerance), the algorithm proceeds as before. Otherwise, the 
ijjk function is replaced by its best-fit polynomial. This allows a faster solution of the 
projected IVP (|5.10p through an explicit formula containing the functions 

<Pk-l{x) - <y5fc-l(0) ^ 
X 

We now present an experiment showing the importance of a proper stopping crite- 
rion. We compute exp(— 5A)w for A being the convection-diffusion operator discretized 
on a uniform mesh 102 x 102 with Pe = 100. The tolerance is set to toler = 10~^. 
We let the usual Arnoldi method, restarted every 100 steps, run with the stagnation- 
based stopping criterion of [37] and with the stopping criterion of [H] based on the 
generalized residual (|2.5p . We emphasize that the stagnation-based stopping criterion 
of |37j is proposed for the Arnoldi method with Sal strategy and it works, in our 
limited experience, very well as soon as Sal is employed. However, the stagnation- 
based stopping criteria are used in Krylov methods not only with Sal (see e.g. [3]) 
and it is instructive to see possible implications of it. Together with Arnoldi, the 
Krylov-Richardson method is run with the residual-based stopping criterion. The 
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Table 6.2 

Results of the test runs of the Krylov- Richardson and Arnoldi with the residual-based stopping 
criterion. The CPU times are measured on a 2GHz Mac PC (mesh 102 X 102 j and on a 3GHz 
Linux PC (mesh 402 X 402 j. We emphasize that the CPU time measurements are made in Matlab 
and thus are only an approximate indication of the actual performance. 
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convergence plots are shown in Figure |631 As we see, both existing stopping criteria 
turn out to be far from optimal in this test. With the residual-based stopping cri- 
terion, the Arnoldi method required 438 matvecs and 78 s CPU time to obtain an 
adequate accuracy of 4.5e— 7. 

To facilitate a fare comparison between the conventional Arnoldi and the Krylov- 
Richardson methods, in all the other tests we use the residual-based stopping criterion 
for both methods. Table 16.21 and Figures 16.51 contain the results of the test runs to 
compute exp{—A)v for tolerance toler = 10~*. We show the results on two meshes 
for two different Peclet numbers only, the results for other Peclet numbers are quite 
similar. 

The first observation we make is that the CPU times measured in Matlab seem to 
favor the EXPOKIT code, disregarding the actual matvec values. We emphasize that 
when the Sal strategy is not used, the main computational cost in all the three meth- 
ods, EXPOKIT, Arnoldi and Krylov-Richardson, are k steps of the Arnoldi/Lanczos 
process. The differences among the three methods correspond to the rest of the com- 
putational work, which is 0{k^), if at least if not too many restarts are made. 

The second observation is that the convergence of the Krylov-Richardson itera- 
tion is essentially the same as of the classical Arnoldi/Lanczos method. This is not 
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influenced by the restart value or by the Sal strategy. Theoretically, this is to be 
expected: the former method applies Krylov for the Lp function, the latter for the 
exponential; for both functions similar convergence estimates hold, though they are 
slightly more favorable for the ip function [TT] . 

When no Sal strategy is applied, the gain we have with Krylov-Richardson is 
two-fold. First, a projected problem of much smaller size has to be solved. This is 
reflected by the difference in the CPU times of Arnoldi and Krylov-Richardson with 
restart 15 in lines 2 and 3 of the Table: 26.4 s and 6.7 s. Of course, this effect can 
be less pronounced for larger problems or on faster computers — see the corresponding 
lines for Arnoldi and Krylov-Richardson with restart 15 on a finer mesh. Second, we 
have some freedom in choosing the initial vector (in standard Arnoldi/Lanczos we 
must always start with v). This freedom is not complete because the residual of the 
initial guess has to have scalar dependence on time. Several variants for choosing the 
initial vector exist, and we will explore these possibilities in the future. 

A significant reduction in total computational work can be achieved when Krylov- 
Richardson is combined with the Sal strategy. The gain is then due to the reduction 
in the number of the inner iterations (the number of outer iterative steps is approxi- 
mately the same). In our limited experience, this is not always the case but typically 
takes place when, for instance, v and A represent discretizations of a smooth func- 
tion and a smooth partial differential operator, respectively. Currently, we do not 
completely understand this behavior. Apparently, the Krylov subspace vectors built 
in the Krylov-Richardson method constitute more favorable right-hand sides for the 
inner Sal solvers to converge. It is rather difficult to analyze this phenomenon, but 
we will try to do this in the near future. 

6.5. Initial vector and Krylov subspace convergence. It is instructive to 
observe dependence of the Krylov subspace methods on the initial vector v. In par- 
ticular, if (|1.3p stems from an initial-boundary-value problem (IBVP) and A is a 
discretized partial differential operator, a faster convergence may take place for v sat- 
isfying boundary conditions of the problem. Note that for the convection-diffusion 
test problem from the previous section this effect is not pronounced {v did not sat- 
isfy boundary conditions), probably due to the jump in the diffusion coefficients. We 
therefore demonstrate this effect on a simple IBVP 

Wf = Aw, M(a;, y,z,0) = uo(x,y, z), (6.1) 

posed for {x,y,z) G [0,1]^ for unknown function u{x,y, z,t) obeying periodic bound- 
ary conditions. We use a fourth-order finite volume discretization in space from [41] 
on a regular mesh 40 x 40 x 40 and arrive at I VP (|1.3p which we solve for t = 1000 
by computing exp{—tA)v. In Figure convergence of the Krylov-Richardson and 
Arnoldi/Lanczos methods is illustrated for the starting vector v corresponding to 

uq{x, y, z) — sin(27rx) sin(27r?/) sin(27rz) -|- x{a — x)y{a — y)z{a — z), 

with a = 2 or a = 1. In both cases the restart value is set to 100. The second 
choice a = 1 (right plot) agrees with boundary conditions in the sense that uq can be 
periodically extended and leads to a faster convergence. The same effect is observed 
for the Krylov-Richardson and Arnoldi/Lanczos methods with the Sal strategy, with 
a reduction in the number of steps from 12 to 8 or 9. Remarkably, EXPOKIT(IOO) 
converges for both choices of v within the same number of steps, 306. Apparently, 
this is because EXPOKIT splits the given time interval [0,icnd], building for each 
subinterval a new Krylov subspace. 
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Fig. 6.6. Convergence plots of the Arnoldi/ Lanczos and the new Krylov- Richards on methods for 
the fourth- order finite volume discretization of the three-dimensional Laplacian with periodic bound- 
ary conditions. Left: the starting vector v disobeys the boundary conditions, right: v is consistent 
with the boundary conditions. 

7. Concluding remarks and an outlook to further research. The proposed 
residual notion appears to provide a reliable stopping criterion in the iterative methods 
for computing the matrix exponential. This is confirmed by the numerical tests and 
analysis. Furthermore, the residual concept seems to set up a whole framework for 
a new class of methods for evaluating the matrix exponential. Some basic methods 
of this class are proposed in this paper. Many new research questions arise. One of 
them is a comprehensive convergence analysis of the new exponential Richardson and 
Krylov-Richardson methods. Another interesting research direction is development 
of other residual-based iterative methods. In particular, one may ask whether the 
exponential Richardson method (|5.2p - (|5.3p can not be used as a preconditioncr for 
the Krylov-Richardson method (|5.7I) . We plan to address this question in future. 

Finally, an interesting question is whether the proposed residual notion can be 
extended to other matrix functions. This is possible once a residual equation can be 
identified, i.e. an equation such that the matrix function satisfies this equation (see 
Table [TTT|) . For example, if we are interested in computing the vector u — cos(A)w, for 
given A £ R"^" and v S M", then we may consider a vector function u{t) — cos{tA)v^ 
which is a solution of the IVP 

u"{t) = -A^u, u(0) = V, u'(0) = 0. 

Thus, for an approximate solution Uk{t) ~ u{t) satisfying the initial conditions, the 
residual can be introduced as 

rk{t) = ^A^uk{t)-ul{t). 
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